# TENet: A Text-Enhanced Network for Few-Shot Semantic Segmentation with Background-Aware Query Refinement


> **Abstract:** *Existing few-shot semantic segmentation (FSS) methods suffer from limited annotation data and domain gaps between support and query images. Although recent multi-modal approaches incorporate textual information to mitigate this gap, they primarily focus on visual features and foreground text, ignoring the value of background semantics.
However, background context plays a crucial role in reasoning: its semantic association with the foreground helps the model better distinguish the target.
Motivated by this, we propose TENet (Text Enhancement Network), a novel FSS framework that uses both foreground and background text to generate high-quality activation maps for query features. TENet adaptively generates background text from foreground semantics through a DeepSeek-based activation generation module. The background text is encoded with a frozen CLIP text encoder and fused with visual features to produce refined activation maps. To further improve alignment precision, we introduce a joint optimization strategy that combines dynamic and fixed refinement methods.
Extensive experiments on PASCAL-5$^i$ and COCO-20$^i$ demonstrate that the proposed TENet consistently outperforms state-of-the-art methods, validating the effectiveness of incorporating background text and refined activation mechanisms in FSS.*
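
For intuition, fusing text with visual features can be pictured as a CLIP-style similarity map. The sketch below is a generic illustration, not TENet's actual module. It assumes per-pixel visual embeddings plus one foreground and one background text embedding in a shared space (random arrays stand in for real CLIP features), scores each pixel against both texts by cosine similarity, and applies a two-way softmax so the background text competes with the foreground. The name `text_activation_map` and the temperature value are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def text_activation_map(
    visual_feats: np.ndarray,   # (H, W, C) per-pixel visual embeddings
    fg_text: np.ndarray,        # (C,) foreground text embedding
    bg_text: np.ndarray,        # (C,) background text embedding
    temperature: float = 0.07,  # hypothetical value
) -> np.ndarray:
    """Return an (H, W) foreground activation map with values in [0, 1]."""
    v = l2_normalize(visual_feats)                   # (H, W, C)
    t = l2_normalize(np.stack([fg_text, bg_text]))   # (2, C)
    logits = v @ t.T / temperature                   # (H, W, 2) scaled cosine similarities
    logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)       # softmax over {foreground, background}
    return probs[..., 0]                             # keep the foreground channel


# Toy usage: pixel (0, 0) matches the foreground text exactly.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 4, 8))
fg = feats[0, 0]
bg = -fg
amap = text_activation_map(feats, fg, bg)
print(amap.shape)
```

Because the softmax is taken over the two texts, a pixel only scores high when it is closer to the foreground text than to the background text, which is the intuition behind using background semantics at all.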


## Get Started

### Environment

- python == 3.8.20
- torch == 1.11.0
- torchvision == 0.12.0
- opencv-python == 4.10.0.84
- numpy == 1.24.3
- mmcv-full == 1.6.2
- mmsegmentation == 0.27.0  
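
A quick way to confirm that an environment matches the pinned versions above is to query the installed distributions. The sketch below is a hypothetical helper (not part of this repository) built on the standard-library `importlib.metadata`, which is available from Python 3.8. It ignores local version tags such as `+cu113` that CUDA builds of PyTorch append.

```python
import platform
from importlib import metadata

# Pinned versions from the list above, keyed by PyPI distribution name.
PINNED = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "opencv-python": "4.10.0.84",
    "numpy": "1.24.3",
    "mmcv-full": "1.6.2",
    "mmsegmentation": "0.27.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {name: (expected, found)} for every package that does not match.

    `found` is None when the distribution is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Drop local version tags such as "+cu113" before comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    print(f"python {platform.python_version()} (expected 3.8.20)")
    problems = find_mismatches(PINNED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("all pinned packages match")
```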



### Dataset
Please download the following datasets and put them into the `../data` directory:

+ PASCAL-5<sup>i</sup>: [**PASCAL VOC 2012**](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) and [**SBD**](http://home.bharathh.info/pubs/codes/SBD/download.html)

+ COCO-20<sup>i</sup>: [**COCO 2014**](https://cocodataset.org/#download).
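
PASCAL-5<sup>i</sup> and COCO-20<sup>i</sup> each divide their classes into four folds, and each experiment holds out one fold as the novel classes. The sketch below assumes the conventions used by PFENet-style codebases: contiguous blocks of 5 classes for PASCAL and an interleaved split (every 4th class) for COCO. The function and the `"pascal"`/`"coco"` names are hypothetical; the configs in `./config` determine the split this repository actually uses.

```python
from typing import List, Tuple


def fold_classes(dataset: str, fold: int) -> Tuple[List[int], List[int]]:
    """Return (base_classes, novel_classes) for one fold, using 1-based class ids."""
    if fold not in range(4):
        raise ValueError("fold must be 0, 1, 2 or 3")
    if dataset == "pascal":
        all_classes = list(range(1, 21))                 # 20 VOC classes
        novel = list(range(5 * fold + 1, 5 * fold + 6))  # 5 contiguous classes
    elif dataset == "coco":
        all_classes = list(range(1, 81))                 # 80 COCO classes
        novel = list(range(fold + 1, 81, 4))             # every 4th class, 20 in total
    else:
        raise ValueError(f"unknown dataset: {dataset}")
    base = [c for c in all_classes if c not in novel]
    return base, novel


base, novel = fold_classes("pascal", 1)
print(novel)  # [6, 7, 8, 9, 10]
```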

List generation follows [PFENet](https://github.com/dvlab-research/PFENet). You can download the lists directly and put them into the `./lists` directory.
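
Each list file pairs an image with its label mask. The sketch below assumes the PFENet format of one `image_path label_path` pair per line, separated by whitespace and relative to the dataset root; `read_list`, the dataset root, and the example file names are hypothetical.

```python
import os
import tempfile
from typing import List, Tuple


def read_list(list_path: str, data_root: str) -> List[Tuple[str, str]]:
    """Parse a list file into (image_path, label_path) pairs under data_root."""
    pairs = []
    with open(list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            image_rel, label_rel = line.split()[:2]
            pairs.append((os.path.join(data_root, image_rel),
                          os.path.join(data_root, label_rel)))
    return pairs


# Usage with a throwaway one-line list file (hypothetical paths).
data_root = "../data/VOC2012"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("JPEGImages/2007_000032.jpg SegmentationClassAug/2007_000032.png\n")
pairs = read_list(tmp.name, data_root)
os.remove(tmp.name)
print(pairs[0][0])
```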

Following [PFENet](https://github.com/dvlab-research/PFENet) and [HDMNet](https://github.com/Pbihao/HDMNet), we place the pre-trained backbones in the `../initmodel` directory.
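
Because the paths above are relative, the data and backbone weights live next to the repository folder rather than inside it. The hypothetical helper below, assuming it is called with the repository root, reports which of the expected directories exist.

```python
from pathlib import Path
from typing import Dict


def check_layout(repo_root: str = ".") -> Dict[str, bool]:
    """Report which of the directories named in this README exist.

    Paths are resolved relative to the repository root, so `../data`
    and `../initmodel` are siblings of the repository folder.
    """
    root = Path(repo_root)
    expected = {
        "../data": root / ".." / "data",
        "./lists": root / "lists",
        "../initmodel": root / ".." / "initmodel",
    }
    return {name: path.is_dir() for name, path in expected.items()}


if __name__ == "__main__":
    for name, ok in check_layout().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```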

Download the pre-trained CLIP ViT-B/16 weights [**here**](https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt) and put them in `../initmodel/clip`.
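
The downloaded checkpoint can be checked for corruption before use. OpenAI's CLIP download helper treats the second-to-last segment of this URL as the file's SHA-256 digest, and the sketch below relies on the same convention. The helper names are hypothetical, and the checkpoint path is an assumption based on the directory named above.

```python
import hashlib
from pathlib import Path

CLIP_URL = ("https://openaipublic.azureedge.net/clip/models/"
            "5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt")


def expected_sha256(url: str) -> str:
    """Return the second-to-last URL path segment (the published SHA-256 digest)."""
    return url.split("/")[-2]


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a large checkpoint is not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checkpoint(path: Path, url: str = CLIP_URL) -> bool:
    """True when the local file's digest matches the one embedded in the URL."""
    return sha256_of(path) == expected_sha256(url)


if __name__ == "__main__":
    ckpt = Path("../initmodel/clip/ViT-B-16.pt")  # assumed file name
    if not ckpt.is_file():
        print(f"{ckpt} not found")
    else:
        print("checksum OK" if verify_checkpoint(ckpt) else "checksum mismatch, re-download")
```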


## Scripts
- First, update the configurations in `./config` for training (coming soon) or testing.

- Test script
```shell
sh test.sh [exp_name] [dataset] [GPUs]

# Example (split0 | COCO dataset | 1 GPU for testing):
# sh test.sh split0 coco 1
```
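
To evaluate every fold in turn, the test script can be called once per split. The sketch below assumes experiment names `split0` to `split3` follow the pattern of the example above and that a matching configuration exists for each. The dataset name `coco` comes from the example, while any other name (such as `pascal`) is a guess. The helper names are hypothetical, and with `dry_run=True` the sketch only prints the commands.

```python
import subprocess
from typing import List


def build_test_command(exp_name: str, dataset: str, gpus: int) -> List[str]:
    """Build the argument list for `sh test.sh [exp_name] [dataset] [GPUs]`."""
    return ["sh", "test.sh", exp_name, dataset, str(gpus)]


def run_all_splits(dataset: str, gpus: int = 1, dry_run: bool = True) -> List[List[str]]:
    """Evaluate split0..split3 sequentially; only print the commands when dry_run is True."""
    commands = [build_test_command(f"split{i}", dataset, gpus) for i in range(4)]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failing split
    return commands


if __name__ == "__main__":
    run_all_splits("coco", gpus=1, dry_run=True)
```

Running the splits sequentially keeps GPU memory usage predictable on a single card; `check=True` stops the loop as soon as one evaluation fails.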

## References

This repository owes its existence to the exceptional contributions of other projects:
* PI-CLIP: https://github.com/vangjin/PI-CLIP
* PFENet: https://github.com/dvlab-research/PFENet
* BAM: https://github.com/chunbolang/BAM
* HDMNet: https://github.com/Pbihao/HDMNet

Many thanks for their excellent work.



